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Pipelined data processing system monitors identifier provided by 
functional unit and stores monitored result in buffer 

Patent Assignee: INT BUSINESS MACHINES CORP (IBMC ) 

Inventor: LEE H G; CHEONG H; LE H Q 

Number of Countries: 002 Number of Patents: 003 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 
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A history buffer is coupled to logic unit for receiving a 
target identifier of a specific instruction. The buffer monitors the
identifier provided by the functional unit, and stores the monitored
result when the control signal output from the functional unit is in a first
logical state. The control signal is switched to a second logical state
to indicate that the monitored result is unavailable for storage
within a register.

DETAILED DESCRIPTION - The target identifier identifies a register 
to be accessed by the instruction. An INDEPENDENT CLAIM is also 
included for method for capturing data in data processing system. 

USE - For data processing. 

ADVANTAGE - Enables ensuring that data is captured in correct 
chronological order and avoids corruption of historical values of 
registers.

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
pipeline processor, 
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Computer system with cache system - has cache controller which 
directs cache memory to designate data which are returned by CPU to 
main memory to another address, in response to warning signal received 
from CPU 

Patent Assignee: TOSHIBA KK (TOKE ) 

Number of Countries: 001 Number of Patents: 001 

Patent Family : 

Patent No Kind Date Applicat No Kind Date Week 

JP 8328953 A 19961213 JP 95130457 A 19950529 199709 B 

Priority Applications (No Type Date): JP 95130457 A 19950529
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
JP 8328953 A 10 G06F-012/08 

Abstract (Basic) : JP 8328953 A 

The computer system has a CPU (1) which outputs a warning signal to 
indicate first memory data access enforcement or indirect addressing 
instruction. The CPU outputs an access control command to a main memory 
(4). A cache memory (3) is accessed by the CPU only when it holds the 
address of the data to be accessed in the main memory. 

The data are returned to the main memory when the accessed data are 
altered. A cache controller (2) controls the cache memory so that the 
returned data will be stored in a certain address. 

ADVANTAGE - Shortens time of memory access through cache memory; 
enables efficient memory access. 
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Cache apparatus for magnetic disk sub-system - controls access of disk
units corresponding to preset RAID operating mode, when disk array is 
accessed by read and write cache control units 
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US 5835940 A 62 G06F-012/08 Div ex application US 94266737 

Abstract (Basic) : US 5835940 A 

The apparatus has a disk array with several disk units into which 
data read-write is performed. A cache memory stores part of data 
in the disk array. A cache management unit manages storing state of 
cache memory based on a hash table indicating contents of cache 
registration. An LRU table includes cache blocks having effective data 
that are sequentially linked, based on a registration order. The cache 
block which is used most recently is set as a leading head block. A 
read cache control unit reads data from the cache memory to a higher 
order apparatus, when a read request is received from the higher order 
apparatus. When relevant data is not found in the cache memory for 
transfer, data is read from the disk array to the higher order 
apparatus.

A write cache control unit registers a management information when 
write request is received from the higher order apparatus. Data is 
written into the cache memory corresponding to the management 
information. A write back control unit extracts data which is not 
yet stored into the disk array from cache memory. The extracted data is 
written back, when predetermined write back conditions are 
satisfied. A disk array control unit controls access to several disk 
units corresponding to a preset RAID operating mode, when disk array is 
accessed by the read and write cache control unit. 

ADVANTAGE - Improves accessing speed. Prevents fault occurrence, 
thereby preventing loss of data. 
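
The cache management described above (hash-table lookup, an LRU list headed
by the most recently used block, and deferred write-back) can be sketched
roughly as follows. This is a minimal illustration under stated assumptions,
not the patented apparatus: the class name DiskCache, the dict standing in
for the disk array, and the flush-everything write_back() are all
hypothetical, and the RAID control unit is not modelled.

```python
from collections import OrderedDict


class DiskCache:
    """Toy block cache: dict lookup (hash table) plus LRU ordering, with write-back."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk             # hypothetical backing store: block number -> data
        self.blocks = OrderedDict()  # cached blocks, most recently used first
        self.dirty = set()           # blocks not yet written to the disk array

    def _touch(self, block, data):
        self.blocks[block] = data
        self.blocks.move_to_end(block, last=False)  # becomes the head of the LRU list
        while len(self.blocks) > self.capacity:
            victim, victim_data = self.blocks.popitem(last=True)  # least recently used
            if victim in self.dirty:  # write back before discarding
                self.disk[victim] = victim_data
                self.dirty.discard(victim)

    def read(self, block):
        # Serve from the cache on a hit, otherwise read from the disk array.
        data = self.blocks[block] if block in self.blocks else self.disk[block]
        self._touch(block, data)
        return data

    def write(self, block, data):
        # Register the block as dirty; the disk array is updated later.
        self.dirty.add(block)
        self._touch(block, data)

    def write_back(self):
        """Flush all dirty blocks; stands in for the 'predetermined conditions'."""
        for block in sorted(self.dirty):
            self.disk[block] = self.blocks[block]
        self.dirty.clear()
```

In this sketch a dirty block reaches the backing store only on eviction or on
an explicit write_back() call, which is where the speed gain and the
data-loss exposure of write-back both come from.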
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SRAM - in which specific data requested by processor is provided to 
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Number of Countries: 001 Number of Patents: 001 
Patent Family: 
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Abstract (Basic) : US 5781926 A 

The system includes a main memory for storing data used by a 
processor (502). A cache memory (504) connected to the main memory and 
processor also stores data used by the processor. The cache memory 
comprises a cache memory array which has multiple cache lines (528). 
Each of the cache lines has multiple sub-cache line locations 
(530,532,534,536) for storing data. A cache directory (506) stores tag 
information associated with the data stored in the cache memory array. 
A cache enable unit provides multiple sub-cache line enable bits
associated with the sub-cache line locations in any one of the cache
lines, for allowing sub-cache line locations to be directly written
into main memory without requiring intermediate storage in a cache
buffer.

The cache enable unit has a cache logic (508) for reading tag
information and the sub-cache line enable bit to determine whether
specific data requested by processor is stored in the cache memory 
array. The cache enable logic is also used for providing the 
specific data to the processor when the cache line is not 
completely filled. 

ADVANTAGE - Avoids use of pipeline stall for filling instruction 
queue, thereby improving performance. Reduces processing time by 
allowing access to sub-cache lines before filling entire cache line. 
Updates several sub cache line enable bits within short time. 
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Patent No Kind Lan Pg Main IPC Filing Notes 
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Abstract (Basic) : EP 718758 A 

The cache control unit has a content addressable memory for storing 
most recent addresses of cache lines which have been accessed for 
instruction fetching for execution, the instructions being of variable 
length. Boundary identification logic is responsive to the content 
addressable memory for identifying instruction boundaries for each of 
the number of instructions held in cache. 

An anticipation buffer holds a next instruction, the next 
instruction being located and identified by the content addressable 
memory and the boundary identification logic. 

ADVANTAGE - Effective mechanism of minimised complexity to permit 
high speed out-of-order instruction execution in microprocessor 
architectures that permit store-to-the-instruction stream operations. 
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ABSTRACT 

PROBLEM TO BE SOLVED: To prevent an overhead while considering the 
reference frequency of a data block at the time of replace. 

SOLUTION: The cache memory controller of a set associative system is 
provided with a cache memory of plural ways installed in a processor so as 
to be used for exchanging data between a register and a main storage device 
and a control means for controlling this cache memory. In this case, a 
cache tag 10 to be stored in the cache memory is composed of a data 
part 11 for storing data, an address tag part 13 for storing the 
address of data, an LRU bit part 15 for storing information for grasping 
whether these data are just recently referred to or not and access 
frequency bit part 17 for storing information for grasping the reference 
frequency of referring to the data and the information of the said access 
frequency bit part is considered for a replace algorithm. 
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ABSTRACT 

PURPOSE: To provide a cache device which can leave specific data
preferentially in a cache.

CONSTITUTION: Each entry of the cache is provided with a reuse bit for
preferentially inhibiting data from being expelled and when only one reuse
bit (404) of two sets of entries is true as shown in the figure, the other
entry is selected as an expelled entry. When the reuse bits of both the
sets are true or false, an entry whose recent bit 402 is false is selected.
The instruction word of a load instruction is provided with a bit
indicating whether or not there is REUSE designation and when this bit is
true, the reuse bit (404) of a cache entry including data accessed by the
load instruction becomes true.
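
The replacement rule in this abstract can be written out as a short
selection function. The sketch below is only an illustration: the Entry
type and the select_victim name are hypothetical, and the tie-break when
neither recent bit is set (entry 0 is expelled) is an assumption, since the
abstract does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    reuse: bool   # set when the loading instruction carried REUSE designation
    recent: bool  # true for the entry that was referenced most recently


def select_victim(set_entries):
    """Return the index (0 or 1) of the entry to expel from a two-entry set."""
    a, b = set_entries
    if a.reuse != b.reuse:
        # Exactly one entry is protected by its reuse bit: expel the other one.
        return 1 if a.reuse else 0
    # Both protected or both unprotected: expel the entry whose recent bit is false.
    # Assumption: if neither recent bit is set, entry 0 is chosen.
    return 0 if not a.recent else 1
```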
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22, 1990 (19900522) 

ABSTRACT 

PURPOSE: To shorten the waiting time of an MPU to improve the throughput of
a system by providing a flag bit group indicating whether internal
data is effective or not correspondingly to a block buffer.

CONSTITUTION: A block buffer 18 where one-block components of data can
be stored at a time is provided between an internal data bus 17b of a
storage device, for example, a cache memory 1 and a data memory part
12, and the unit of data transfer between the cache memory and the data
memory part 12 is larger than that between the cache memory 1 and a
main memory. A flag bit group in a register 19 is provided which
corresponds to respective data stored in the block buffer 18 and
indicates whether data is effective/ineffective, and a decoder 19a is
provided which can simultaneously select plural flag bits in accordance
with the transfer data width at the time of terminating data transfer from
the block buffer 18 to the data memory 12, and flag bits which are
set one by one are simultaneously cleared.
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ABSTRACT 

PURPOSE: To buffer storage length data of optional length by stopping
data reading operation from a buffer memory when a final data flag bit
is detected.

CONSTITUTION: When data is written on a magnetic tape device (MTU) 5 from a
channel device (CH) 3, an input/output controller 4 receives a write command
and sets the next address of a final data address in a buffer memory
address register 29. Then, CH3 sends data, which is stored in the buffer
memory 23. When final data of one record length is sent, '1' is sent as a
final data indication signal together with the data. Consequently, the
final data flag in the buffer memory is set to '1'. When the buffer
memory 23 becomes full, reading operation is started and one transfer data
is read out of the buffer memory 23. At this time, only the data part
in a B data register 24 is sent to MTU5. Then, when the final data flag is
'1', transfer to MTU5 is finished.
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Abstract (Basic) : RD 435104 A 

NOVELTY - In a multi-tasking data processing system each task is
assigned its own cache store. Each task can start without flushing
any other task's cache and during task switching no write-back is
required. Each task has its own identification number set by the
operating system that selects the appropriate cache for the current
task. All non-selected areas of cache can be left inactive to save
power consumption.

USE - In multi-tasking data processing systems.

ADVANTAGE - Reduces cache charge/discharge overhead and makes
possible cache swapping, compiler directed pre-charging and dynamic
cache sizing. The cost of cache stores can be reduced as can power
consumption.
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A linked list of entries is stored identifying segments of data in the cache memory
which have been written-to in response to at least one write command 
from the host processor. One entry in the linked list corresponds to 
each segment in the cache memory that has been written to, the control 
unit is responsive to write commands from the host processor for making 
an entry in the linked list the first time a segment of data in the
cache memory is written to, the entries being linked in the order in
which their corresponding segments are first written to. A trickle
device passes segments of data from the cache memory to the bulk
memory.

The trickle device is responsive to the entries in the linked list for
trickling segments of data to the bulk memory in the same order as
their corresponding entries were made in the linked list. Replacement
of the segments in cache memory is accomplished in order from the least
recently used to the most recently used and trickling of segments is
accomplished in the order of age since first being written to, the
oldest written-to segment being trickled first.
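
The trickle ordering described above (one list entry per written-to segment,
made on the first write only, and drained oldest first) can be sketched as
follows. This is an illustrative model only: TrickleCache, trickle_one and
the dict used as bulk memory are hypothetical names, an OrderedDict stands
in for the linked list, and LRU replacement of clean segments is omitted.

```python
from collections import OrderedDict


class TrickleCache:
    """Toy model of first-write-ordered trickling from cache to bulk memory."""

    def __init__(self, bulk):
        self.bulk = bulk              # hypothetical bulk memory: segment id -> data
        self.segments = {}            # cached segment data
        self.written = OrderedDict()  # stands in for the linked list (first-write order)

    def write(self, seg, data):
        self.segments[seg] = data
        if seg not in self.written:   # an entry is made only on the first write
            self.written[seg] = True

    def trickle_one(self):
        """Copy the oldest written-to segment to bulk memory; return its id or None."""
        if not self.written:
            return None
        seg, _ = self.written.popitem(last=False)  # oldest entry first
        self.bulk[seg] = self.segments[seg]
        return seg
```

Note that a second write to a segment does not move its entry, so the data
trickled is the latest value while the position reflects the first write.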
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ABSTRACT 

PROBLEM TO BE SOLVED: To make the image forming speed higher while
suppressing the rise in the cost of a product having an image generating
device.

SOLUTION: When a cache access from a drawing means 10 to a cache system 20
has been generated, an image data control part 22-1 makes a hit/miss
judgement on specific data in a cache array 21. When a miss is
judged, the value of a write-back management bit regarding a requested
cache class is confirmed. When the value of the write-back management
bit is '1', a load request to a frame buffer 30 is issued. When the value
of the write-back management bit is '0', on the other hand, a cache
line is automatically generated by a line inside generation part 22-2
according to a background pattern.
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ABSTRACT 

keep the consistency of contents in each cache memory with a 
re in the case of cache memory control for closely coupled 
s . 



CONSTITUTION: Each processor is provided with an instruction execution
part 201, a copy back type cache memory 202 and a bus interface part
203. When a processor is in a standby state because it does not hold the
bus use right at the time of writing data held in the cache memory 202
back to a main memory, a system bus 4 is monitored by a monitor circuit
204. When it is detected that data on the main memory in the same line as
the write back object are reloaded by the other processor, the bus
interface part 203 cancels a bus access wait state corresponding to a
cancel signal 207, cancels a write back request to the bus interface 203
of the copy back type cache memory 202 corresponding to a cancel signal
208 and cancels the external access of the instruction execution part 201
corresponding to the cancel signal 208.
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ABSTRACT 

PURPOSE: To efficiently execute holding of the consistency between each
write-through cache by providing a consistency holding part having a
bus monitoring part, a data fetch part and a cache data changing
part against each write-through cache.

CONSTITUTION: A bus monitoring part 131 or 231 in a consistency holding
part 13 or 23 in a central processing unit 1 or 2 executes such a
processing as shown hereinbelow. By always monitoring a memory bus 4,
whether a write instruction to a main storage device 3 is issued or not is
decided. As a result of this decision, in the case the write instruction to
the device 3 is issued, whether a write destination address related to its
write instruction is contained in a memory block held by a write-through
cache 12 or not is decided. As a result of this decision, in the case the
write destination address related to its write instruction is contained in
the cache 12, it is requested to fetch write data (data related to the
write instruction concerned) from the memory bus 4, to a data fetch part
132 (write data fetch request is executed).
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ABSTRACT

PURPOSE: To reduce the overhead loss by using the instructions of 1st and
2nd groups to compile a software program.



CONSTITUTION: An instruction prefetch control part 6 gives a request to
a 1st system instruction cache control part 311 to take an
instruction out of a software program 12 through an address included in a
main storage storing a 1st group instruction following a 2nd group
instruction. Thus the part 6 retrieves a hit state. When a hit state is
confirmed, the data are immediately stored in an instruction register 5
from a 1st system instruction cache 310. Then the execution of an
instruction program is started. In a mishit state, however, the data are
read out of the storage 1 via a main storage access control part 2 under
the control of the part 311. Then the data are loaded in a block of the
cache 310 and also in an instruction register 5, and the instruction of
the instruction program is carried out. In such a constitution, the
overhead loss can be reduced.
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CODEN: SCJAEP ISSN: 0882-1666 

Language: English 

Document Type: JA; (Journal Article) Treatment: T; (Theoretical) 
Journal Announcement: 9712W2 

Abstract: Cache memory not only allows one to decrease average memory
access time, but also relieves bus traffic and decreases bus latency. In
this paper, to further relieve bus traffic, a write-back cache memory
(DMC, Decoupled Modified-bit Cache) is proposed that provides data
modification at byte level. DMC supports selective write-back of only
modified data to memory which contributes to further relief of bus traffic.
To avoid considerable requirements for additional hardware in implementing
DMC, a method is proposed to separate status bits that indicate data
modification from the cache, allocating them as necessary. Benchmark
tests with a variety of applications were performed to validate DMC. The
results show that, with an additional 3% of the cache memory allocated as
memory cells for status bits, memory usage intensity and data flow through
the bus are reduced by about 35% and 10%, respectively, when compared to a
conventional write-back cache. (Author abstract) 11 Refs.
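
The idea of tracking modification at byte level and writing back only the
modified bytes can be illustrated with a small sketch. It is not the DMC
design itself: the ByteDirtyLine class is hypothetical, it keeps one status
bit per byte inside the line object, and it does not model the decoupled,
on-demand allocation of status bits that the paper proposes.

```python
class ByteDirtyLine:
    """Cache line with one modified bit per byte (illustrative only)."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.modified = [False] * len(self.data)  # per-byte status bits

    def store(self, offset, value):
        self.data[offset] = value
        self.modified[offset] = True

    def write_back(self, memory, base):
        """Copy only modified bytes to memory; return how many bytes were transferred."""
        count = 0
        for i, dirty in enumerate(self.modified):
            if dirty:
                memory[base + i] = self.data[i]
                self.modified[i] = False
                count += 1
        return count
```

With two stores into an 8-byte line, write_back() transfers 2 bytes rather
than the whole line, which is the source of the bus-traffic saving.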
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Publication Year: 1992 
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Document Type: CA; (Conference Article) Treatment: T; (Theoretical); A; 
(Applications)

Journal Announcement: 9309W2

Abstract: This paper describes a complete, high-performance, second-level
cache system for a 33-MHz 80486 that is optimized for both high
performance and low component in a single PLD that controls a 128-Kbyte,
direct-mapped, write-through cache that supplements the 486's on-chip
8-Kbyte cache. The overall system can be divided into four main sections:
the CPU, the cache, the cache tags, and the cache controller. The
CPU is a 33-MHz Intel 80486; the 128-Kbyte cache consists of four 44-pin,
synchronous, 32K multiplied by 9 cache RAMs; the cache tag consists
of two CY7B181 4K multiplied by 18 cache tags; the cache controller
is a single CY7C344 PLD. The entire system was simulated and correct
operation verified using Viewlogic's simulation tools. The first part of
this document discusses the four main system blocks, starting with the
memory interface to the 486. Following that, the cache controller
implementation is described. The last section of the paper describes the
simulation method used to verify correct system operation. (Author
abstract) 1 Ref.
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Document Type: PA; (Conference Paper) Treatment: X; (Experimental) 
Journal Announcement: 9103 

Abstract: The interactions between a cache's block size, fetch size,
and fetch policy from the perspective of maximizing system-level
performance are explored. It has been previously noted that, given a simple 
fetch strategy, the performance optimal block size is almost always four or 
eight words. If there is even a small cycle time penalty associated with 
either longer blocks or fetches, then the performance optimal size is 
noticeably reduced. In split cache organizations, where the fetch and 
block sizes of instruction and data caches are all independent design 
variables, instruction cache block size and fetch size should be the 
same. For the workload and write-back write policy used in this
trace-driven simulation study, the instruction cache block size should be 
about a factor of 2 greater than the data cache fetch size, which in turn 
should be equal to or double the data cache block size. The simplest 
fetch strategy of fetching only on a miss and stalling the CPU until the 
fetch is complete works well. Complicated fetch strategies do not produce 
the performance improvements indicated by the accompanying reductions in 
miss ratios because of limited memory resources and a strong temporal 
clustering of cache misses. For the environments simulated, the most 
effective fetch strategy improved performance by between 1.7% and 4.5% over 
the simplest strategy described above. 21 Refs. 
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Abstract: In some computer subsystem architectures, all or part of the 
data from a direct-access storage device is temporarily stored before 
being transmitted to the system. To assure data integrity, Cyclic
Redundancy Code (CRC) check bits are typically appended at the end of
the data field by the transmitting end, and stored at the receiving end. If 
errors are detected, an attempt is made to correct them before sending them 
to the channel. However, once any part of the field is changed because of 
correction or updating, the CRC bits are no longer valid, and have to be 
updated to reflect the changed data. In accordance with the new method 
described, CRC check bits for data which have been updated are themselves 
updated without having to clock the data through a CRC generator. As the 
length of the data field becomes relatively large, improvement in execution 
time becomes more significant. 
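
The abstract does not give the update method, so the following is only a
sketch of the algebraic property that such updates can rely on: a CRC is
affine over XOR, so the CRC of a changed field can be derived from the old
CRC and the bit difference, without reading the unchanged stored data again.
The function name crc_after_update is hypothetical, and Python's
zlib.crc32 (CRC-32) is assumed here in place of whatever code the subsystem
uses. This naive version still runs a CRC over zero-padded buffers, which a
practical method would avoid.

```python
import zlib


def crc_after_update(old_crc, old_part, new_part, offset, total_len):
    """CRC-32 of a field after the bytes at `offset` change from old_part to new_part."""
    if len(old_part) != len(new_part):
        raise ValueError("replacement must have the same length as the original bytes")
    # Bit difference between old and new data, zero everywhere else in the field.
    diff = bytes(a ^ b for a, b in zip(old_part, new_part))
    delta = bytes(offset) + diff + bytes(total_len - offset - len(diff))
    zeros = bytes(total_len)
    # For equal-length messages: crc(x ^ d) == crc(x) ^ crc(d) ^ crc(all zeros).
    return old_crc ^ zlib.crc32(delta) ^ zlib.crc32(zeros)
```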
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Abstract: Cache memory has played a significant role in the memory 
hierarchy and has been used extensively in large systems and minisystems. 
The effectiveness of cache memories with alternative main memory update 
policies in a multiprocessor system is a major concern of this study. The 
performances of write-through with write-allocation or no-write
allocation, buffered write-through, flag-swap, and buffered flag-swap
policies have been analyzed. Because of the dominating cost of the
interface between processors and main memory modules in the multiprocessor 
system, the effect of varying the bus width or block size has also been 
considered. Queuing models were developed to analyze these alternative 
organizations, and results predicted by the models were validated by a set 
of simulations. 30 refs. 
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The widening gap between the speed of processors and main memory has 
encouraged the use of cache memories to reduce the average memory access
time by exploiting the spatial and temporal locality of memory references
in a program. In shared-memory multiprocessors, however, data sharing
reduces the effectiveness of cache memories by introducing the cache
coherence problem and the inherent coherence overhead. The many solutions
to the cache coherence problem attempt to reduce the coherence overhead
by using different mechanisms to enforce coherence, to detect accesses to
stale data, and to reduce the impact of false-sharing. In this thesis, we
examine the performance effect of these different mechanisms. We discuss
how the coherence overhead depends on the sharing characteristics of a
program and define a model for characterizing the sharing behavior of
parallel programs. We also propose and evaluate a partially-valid
write-through write-merged coherence mechanism to tolerate false-sharing.
This mechanism by itself does not entirely eliminate redundant
write-throughs, however, so we also define a compiler algorithm to
eliminate some of the redundant write-throughs to memory.

Using simulations and our model, we identify the high processor-memory 
network traffic as the main problem associated with an updating coherence 
enforcement strategy, and the high number of misses as the main problem 
associated with an invalidating coherence enforcement strategy. We also 
show that the severe performance penalty resulting from inaccurate
compile-time analysis in a software-only coherence detection mechanism
favors the use of a hardware-only or a combined hardware-software
mechanism. Our study of false-sharing indicates that the performance
penalty of false-sharing is severe when a program has a large amount of
sharing and a fine granularity of sharing. Our studies of a
partially-valid write-through cache with and without a merging
write-buffer emphasize the difficulty of using a hardware mechanism to
adjust to the programs' sharing characteristics. Consequently, we address
this difficulty using compile-time analysis in conjunction with a hardware
mechanism to enhance performance.
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High performance processor organizations place large demands on the 
data memory hierarchy. The data bandwidth requirements of a processor can
be a serious performance constraint. As a result, cache memory is a very 
important element in the implementation of a high performance computer. 
Caching a subset of main memory in faster memory provides the processor 
with high bandwidth and low latency data. The utility of caching is 
determined by the cache size, organization, and speed. 

The focus of this dissertation is to improve the performance of a data
cache by the addition of small specialized caches. By analyzing the
spatial and temporal locality in the data reference stream at various
points in the data memory hierarchy, we have developed a mixture of
specialized caches to improve the data memory hierarchy.

We have developed and analyzed write caches, tag caches, subword
caches, fetch caches, and a two-level windowed register file. Each
specialized cache reduces the bandwidth and/or latency demands on the
data memory hierarchy. Write caches reduce write-through cache
traffic. Tag caches reduce tag checks to write-back caches.
Subword caches enhance the benefit of removing the subword hardware from
the primary cache access path. Fetch caches reduce read latency or
increase read bandwidth. A two-level windowed register file provides
large capacity without impacting cycle time. Thus, a mixture of specialized
caches has been developed to improve the data memory hierarchy.
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The purpose of this work is to search for hardware organizations that 
might speed up memory referencing for block-structured languages such as 
Pascal. Our hypothesis is that building basic knowledge about execution 
stacks into the hardware will improve this referencing. The problem divides 
into two parts: efficient calculation of memory addresses and efficient 
movement of data between memory and CPU. A stack address generator attacks 
the first problem. Various schemes for caching data attempt to address
the second: varied cache-write policies (CPU writing cache in the
absence of a tag match), a program prefetch buffer, a top of stack
minicache that limits writethroughs of temporaries, and a two-access cache
that can both be fetched from and stored to in a single microcycle.
Several designs are proposed, microcoded to emulate Pascal P-code, and 
their performances compared on twenty-two Pascal test programs. 

The addition of the stack address generator results in a 40 to 55%
speedup. Varying the cache-write policy has little effect. Program
prefetching shortens the microcode so much that CPU waits for memory
contention and bus protocols are a significant problem. A writeback
policy would help this; the minicache which creates a writeback policy
only for top of stack temporaries reduces much of the contention. Stack
registers were not very beneficial when prefetching was not used. The
minicache, program prefetching, and the two-access cache together give an
additional 40 to 50% speedup. However, in the microcode the top of stack is
either the source or destination of all two-accesses. Hence the two-access
performance can be attained by merely providing the top of stack minicache
the ability to communicate with the main cache within a single microcycle.
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Abstract: The paper proposes a novel cache -memory design for soft-error 
silence, and verifies the design through a simulation that uses a realistic
system and software model (J.L. Hennessy and D.A. Patterson, 1996). The SSI
design is a combination of an n-bit error detector and a fast circuit that
allows real-time-forced invalidation of corrupted data sets. The current
design supports the write-through caching policy and will be extended
for the write-back policy. To verify the effectiveness of the proposed
design approach, mixed-mode simulations are conducted that insert
soft-errors (bit-flips) into the cache-memory model. The simulation
commences while running different classes of programs (ALU-intensive and
branch-intensive programs) designed to stress several key functions of the
target system. System-level failure modes triggered by the soft errors
are observed and all inserted errors are recovered by the SSI scheme. The
performance and layout-area overheads are also quantified. The worst-case
performance/time overhead of the SSI scheme is approximately 9% while the
layout-area overhead is less than 0.5%. (19 Refs)
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A caching method in which modifications to data in the cache aren't copied to 
the cache source until absolutely necessary. Write-back caching is available on 
many microprocessors, including all Intel processors since the 80486. With 
these microprocessors, data modifications (e.g., write operations) to data stored
in the L1 cache aren't copied to main memory until absolutely necessary. In
contrast, a write-through cache performs all write operations in parallel - data
is written to main memory and the L1 cache simultaneously.

Write-back caching yields somewhat better performance than write-through 
caching because it reduces the number of write operations to main memory. 
With this performance improvement comes a slight risk that data may be lost if 
the system crashes. 
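
The difference in main-memory traffic between the two policies can be shown
with a deliberately simple count. The sketch assumes a cache large enough
that no line is evicted before one final flush, a hypothetical line size of
4 bytes, and a made-up function name memory_writes; real caches would
write back earlier on eviction.

```python
def memory_writes(stores, policy, line_size=4):
    """Count main-memory write operations for a sequence of store addresses."""
    if policy == "write-through":
        # Every store is sent to main memory as well as to the cache.
        return len(stores)
    if policy == "write-back":
        # Each dirty line is written once, at the final flush (no evictions assumed).
        dirty_lines = {addr // line_size for addr in stores}
        return len(dirty_lines)
    raise ValueError("unknown policy: " + policy)
```

For the store addresses 0, 1, 2, 0, 8, 9 this gives six memory writes under
write-through and two under write-back, since only two distinct lines are
touched.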
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A write-back cache is also called a copy-back cache. 
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